Data Cleansing: Beyond Integrity Analysis
Authors
Abstract
The paper analyzes the problem of data cleansing and of automatically identifying potential errors in data sets. It gives an overview of the small body of existing literature on data cleansing and reviews error-detection methods that go beyond integrity analysis. The applicable methods include statistical outlier detection, pattern matching, clustering, and data mining techniques. Brief results supporting the use of such methods are given, and the future research directions needed to address the data cleansing problem are discussed.
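The abstract names statistical outlier detection as one of the applicable methods but does not say which statistic is used. As an illustration only, the sketch below flags suspect values with a modified z-score based on the median and the median absolute deviation (MAD). This is a swapped-in, commonly used technique, not the paper's own method; the function name `flag_outliers` and the cutoff of 3.5 are assumptions made for this example.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration; not taken from the paper.
    The modified z-score is 0.6745 * |x - median| / MAD, which is less
    distorted by the outliers themselves than a mean/stdev z-score.
    """
    center = median(values)
    # Median absolute deviation: the median distance from the center.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values coincide with the median, so the score
        # is undefined; report nothing rather than divide by zero.
        return []
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Example: one field value is far from the rest of the column.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
suspect_rows = flag_outliers(readings)
```

A value flagged this way is only a candidate error: it still needs review, since an unusual value can be legitimate.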
Similar resources
The Practices, Perceptions, and Beliefs of Traditional Birth Attendants Regarding Early Breastfeeding Initiation in Zimbabwe: A Qualitative Study
Background & aim: Early breastfeeding initiation (EBFI) is defined as giving breast milk within the first hours following birth and is recommended as a simple strategy for the enhancement of neonatal health and survival. This descriptive qualitative study was conducted to explore the practices, perceptions and beliefs of renowned traditional birth attendants (TBA) regarding EBFI in Chipinge rur...
Cleansing and preparation of data for statistical analysis: A step necessary in oral health sciences research
In many published articles, there is still no mention of quality control processes, which might be an indication of the insufficient importance the researchers attach to undertaking or reporting such processes. However, quality control of data is one of the most important steps in research projects. Lack of sufficient attention to quality control of data might have a detrimental effect on the r...
Reducing the Risk of Insider Misuse by Revising Identity Management and User Account Data
To avoid insider computer misuse, identity and authorization data referring to the legitimate users of the systems must be properly organized and constantly and systematically analyzed and evaluated. In order to support this, a methodology for structured Identity Management has been developed. This methodology includes gathering of identity data spread among different applications, systematic c...
Analysis of Data Cleansing Approaches regarding Dirty Data - A Comparative Study
Data cleansing is the process of detecting and correcting errors and inconsistencies in a data warehouse. It deals with the identification of corrupt and duplicate data inherent in the data sets of a data warehouse to enhance the quality of data. The research was directed at investigating some existing approaches and frameworks for data cleansing that attempted to solve the da...
Data Cleansing - A Prelude to Knowledge Discovery
This chapter analyzes the problem of data cleansing and the identification of potential errors in data sets. The differing views of data cleansing are surveyed and reviewed and a brief overview of existing data cleansing tools is given. A general framework of the data cleansing process is presented as well as a set of general methods that can be used to address the problem. The applicable metho...